A Danish text-to-speech system using a text normalizer based on morph analysis

Authors

  • Björn Granström
  • Peter Molbaek Hansen
  • Nina Gronnum Thorsen
Abstract

A Nordic cooperative project has been started to develop a text-to-speech device for the Nordic languages. The development is based on the system originally created in Stockholm. Language-specific features have necessitated modifications of the original structure. For Danish, this primarily involves the inclusion of a morph-based "text normalizing component". This paper presents the construction and function of the system and also discusses some preliminary use of the device.

Introduction

Speech synthesis has been a major line of research in our two departments for several decades. In Sweden, this effort has resulted in a multilingual text-to-speech system (Carlson & Granström, 1986), commercially available through Infovox AB. A joint effort within the project "A Nordic text-to-speech system", financed by The Nordic Committee on Disability, is aimed at making this device available to the handicapped in the Nordic countries. Although the Nordic languages are mutually intelligible, Danish poses some special problems for a text-to-speech system because the relation between the standard orthography and pronunciation is rather complicated. To tackle this, we have included a unique component in the system that transforms words into an idealized normalized orthography. This is accomplished through a morphological analysis based on a set of moderately large morph lexica.

With a limited set of rules, the result is transformed to a phonetic transcription, including stress. In a phonetic rules component, special care has been taken to realize the prosodic structure of Danish, which differs considerably from standard Swedish or Norwegian. There are also many other differences in structure, such as the ample use of "stød", a kind of creaky voice unknown in the other Nordic languages but which corresponds roughly to the tonal word accent 1 in Swedish and Norwegian.
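The idea of a morph-based normalizer can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries, the uppercase "normalized" spellings, and the function names `segment` and `normalize` are all hypothetical, and the longest-match-with-backtracking strategy is an assumption chosen only for illustration. The sketch splits a word into morphs found in a small lexicon and joins their normalized forms.

```python
# Toy morph lexicon (hypothetical): surface spelling -> placeholder normalized form.
# The uppercase values stand in for an idealized orthography; they are made up.
MORPH_LEXICON = {
    "hus": "HUS",
    "båd": "BÅD",
    "en": "EN",
}


def segment(word, lexicon):
    """Split word into lexicon morphs, trying the longest prefix first.

    Returns a list of morphs, or None if no complete segmentation exists.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


def normalize(word, lexicon):
    """Return the joined normalized morphs, or the word unchanged if unanalysable."""
    morphs = segment(word, lexicon)
    if morphs is None:
        return word
    return "+".join(lexicon[m] for m in morphs)


if __name__ == "__main__":
    print(segment("husbåden", MORPH_LEXICON))
    print(normalize("husbåden", MORPH_LEXICON))
```

In a full system, the normalized string would then be passed to the letter-to-sound rules; falling back to the unchanged word for unanalysable input is likewise only an assumption of this sketch.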

Publication date: 1987